Abstract



We present a method for compiling pattern matching on lazy languages based on previous work 
by Laville and Huet-Levy. It consists of coding ambiguous linear sets of patterns using "Term 
Decomposition," and producing non-ambiguous sets over terms with structural constraints on
variables. The method can also be applied to strict languages giving a match algorithm that 
includes only unavoidable tests when such an algorithm exists. 



Résumé



We present a method for compiling call by pattern matching for lazy languages, extending the
work of Laville and Huet-Levy. We transform ambiguous sets of linear patterns by means of
"Term Decomposition" to produce non-ambiguous sets of terms whose variables carry structural
constraints. The method can also be applied to strict languages, where it yields a matching
algorithm that performs no unnecessary work whenever such an algorithm exists.



Keywords 



Compilation, Call by Pattern Matching, Term Decomposition, Sequentiality 



Acknowledgements 

We are grateful to Jean-Jacques Levy, who made numerous suggestions on the presentation of
this work. We would also like to acknowledge helpful comments made by Gerard Huet and
Hassan Ait-Kaci.



Contents 



1 Introduction
    1.1 Constrained Terms
    1.2 Pattern Matching
    1.3 Compilation

2 Terms and Constraints
    2.1 Terms
    2.2 Constraints
    2.3 Constraint Simplification
    2.4 Constrained Terms
    2.5 Substitution

3 Term Decomposition
    3.1 Decomposition
    3.2 Decomposition Procedure
    3.3 Decomposition Normalization

4 Pattern Matching
    4.1 Sequentiality

5 Examples

6 Conclusion

References






Compiling Pattern Matching by Term Decomposition 






1 Introduction 

We are interested in compiling pattern matching over partially evaluated terms so that only the
computations necessary for the match are performed. This is a kind of lazy computation over
partially defined terms. In 1979 G. Huet and J-J. Levy [5] defined a method for constructing 
match trees for non-ambiguous linear term rewriting systems. However, the application of 
their results to the problem of compiling pattern matching as in the ML language was not clear 
until 1988 when A. Laville [6, 7] showed that it is possible to use their method for ambiguous 
term rewriting systems with a given priority on rules. This priority is necessary to decide 
which rule has to be used in case of conflict. Laville designed a new match predicate that takes 
into account the priority when building the match trees. When this construction is successful, 
the leaves of the match tree form a Minimal Extended Set of Patterns equivalent (from the 
match point of view) to the original system in the case of finite signatures. 

Our method is to code ambiguous ordered term rewriting systems into non-ambiguous 
ones over constrained terms. We replace the priority rule between left parts of the rewriting 
system by constraints over terms. Therefore the match predicate is that of Huet and Levy 
but over constrained terms. Their results are then extended to these terms. Furthermore, as 
a result of the computation of the non-ambiguous set of terms of the system, we also obtain 
a characterization of the set of partially evaluated terms for which every matching algorithm 
will loop. We call it the strict set of the system. Although some algorithms may loop on other 
terms, an optimal algorithm, if it exists, will only loop on the strict set. 

1.1 Constrained Terms 

A term with variables is a representation of all ground terms obtained by replacing its 
variables by terms with no variables. A subset of a given set can be defined either by a 
description of its elements or as the complement of another subset. For example, a variable x 
represents the set of all the ground terms and F(A, y) a subset of x. We can partition the set 
x into three subsets. First, the set of instances of F(A, y). Then, the set of terms for which 
we can decide that they are not instances of F(A, y). Finally, the set that contains partially
evaluated terms of the form F(. . .) whose first argument cannot be evaluated (its computation
loops), denoted F(•, y), as well as the non-evaluated term that we denote •. With the set
notation, this partition is written:

{x} = {F(A, y)} ∪ {x | x ◇ F(A, y)} ∪ {•, F(•, y)}

Following this idea, we define the concept of constrained terms and give some of their algebraic 
properties. 

With this formalism an ordered ambiguous set of terms can be transformed into a non-ambiguous
set of constrained terms. For instance, the set of terms F(A, y), F(x, y) is
ambiguous as the term F(A, B) is an instance of both of them. The set of constrained terms
F(A, y), {F(x, y) | x ◇ A}, {x | x ◇ F(. . .)}, {•, F(•, y)} is not ambiguous: a given term is
an instance of exactly one of these constrained terms.
1.2 Pattern Matching



Call by pattern matching is one of the main features of the ML language [4, 10] and was 
inherited from HOPE [2]. It may be viewed as a generalization of the "case" statement of 
imperative languages. In ML, one can define one's own structural types and very easily write 
operations over them. We will introduce call by pattern matching by extending the Pascal
definitions of "enumerated types".

In Pascal, it is possible to use the case statement to select among different cases by the value 
of an expression of an "enumerated type": 



type T = (C1, ..., Cn);
var x: T;

case x of
    C1 : <<exp1>>
  | . . .
  | Ci : <<expi>>
  | otherwise : <<exp>>

The natural extension of this construct is to allow matching not only constant values but 
more general data structures as in the following example in the language ML where there are 
two cases in the definition of the type of trees: Leaf to represent the leaves of trees and the 
constructor Tree for the other nodes. 



type Tree =
    Leaf of number
  | Tree of number*tree*tree;

case tree of
    Leaf (3)                    -> <<exp1>>
  | Tree (_, Leaf (_), _)       -> <<exp2>>
  | Tree (_, Tree (_,_,_), _)   -> <<exp3>>
  | otherwise                   -> <<exp4>>

val tree: Tree = Tree (3, Leaf (2), Tree (4, Leaf (7), Leaf (9)));

In this example the value of the variable "tree" matches both the second and the fourth cases;
since the earlier case has priority, the expression <<exp2>> is executed.
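
The priority rule described above can be sketched in Python. This is only an illustration under
stated assumptions, not the paper's method: terms are encoded as nested tuples, the string "_"
is a wildcard, and the names matches and first_match are hypothetical helpers introduced here.

```python
# Illustrative encoding (an assumption, not from the paper):
# ("Leaf", n) and ("Tree", n, left, right) are terms, "_" is a wildcard.
WILDCARD = "_"

def matches(pattern, value):
    """Return True when `value` is an instance of `pattern`."""
    if pattern == WILDCARD:
        return True
    if isinstance(pattern, tuple):
        return (isinstance(value, tuple)
                and len(pattern) == len(value)
                and all(matches(p, v) for p, v in zip(pattern, value)))
    return pattern == value  # constructor name or constant

def first_match(rules, value):
    """Textual priority: the first rule whose pattern matches wins."""
    for pattern, result in rules:
        if matches(pattern, value):
            return result
    raise ValueError("no rule matches")

rules = [
    (("Leaf", 3), "exp1"),
    (("Tree", "_", ("Leaf", "_"), "_"), "exp2"),
    (("Tree", "_", ("Tree", "_", "_", "_"), "_"), "exp3"),
    ("_", "exp4"),  # plays the role of `otherwise`
]

tree = ("Tree", 3, ("Leaf", 2), ("Tree", 4, ("Leaf", 7), ("Leaf", 9)))
matching = [result for pattern, result in rules if matches(pattern, tree)]
selected = first_match(rules, tree)
```

Here matching lists every case the value fits, while selected is the one retained by textual
priority.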

1.3 Compilation 

If patterns are non-ambiguous, there is a decision procedure due to Huet and Levy [5] that
determines whether an optimal match exists for a set of patterns and, in the case where such a
match exists, produces a search tree from which the match can be compiled. This method
can be illustrated with the following example.

Suppose that we want to match pairs of terms (x, y) of Booleans by the set of patterns
(true, true), (_, false) and (false, true). We choose to look first at a column
that only contains constants (in our example the second one), and divide the patterns by the
constants appearing in that column. The result is a transformed program in which there is
always one column in the pattern to look at.



case (x, y) of
    (true, true)    -> 1
  | (_, false)      -> 2
  | (false, true)   -> 3

case y of
    true  -> (case x of
                  true  -> 1
                | false -> 3)
  | false -> 2
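
As a rough check of this transformation, the following Python sketch compares the original
three-pattern match with the compiled form that inspects y first. The encoding is an assumption
made for illustration: arguments of the compiled form are zero-argument thunks, and diverge is
a hypothetical stand-in for an argument whose evaluation does not terminate.

```python
from itertools import product

def first_match(x, y):
    """Reference semantics on fully evaluated Booleans: textual order."""
    if (x, y) == (True, True):
        return 1
    if y is False:
        return 2
    if (x, y) == (False, True):
        return 3
    raise ValueError("no pattern matches")

def decision_tree(x, y):
    """Compiled form; x and y are zero-argument thunks forced on demand."""
    if y():                      # column y is inspected first
        return 1 if x() else 3   # x is needed only when y is true
    return 2                     # (_, false): x is never forced

def diverge():
    """Stands for an argument whose evaluation does not terminate."""
    raise RuntimeError("forced a non-evaluable argument")

# Both forms agree on every fully evaluated pair.
agree = all(first_match(x, y) == decision_tree(lambda: x, lambda: y)
            for x, y in product([True, False], repeat=2))

# The compiled form answers without touching x when y is false.
lazy_result = decision_tree(diverge, lambda: False)
```

The second check shows why the choice of column matters for lazy evaluation: the compiled form
succeeds on a pair whose first component cannot be evaluated, as long as the second is false.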



There are some sets of patterns, namely non-sequential patterns, for which the method of [5] 
fails. The typical example was proposed by Berry [1]: 

(A,A,_), (B,_,A), (_,B,B) 



In this example the patterns are non-ambiguous but there is no column in which we can 
make the decision. So if we want to avoid looping in the evaluation of this match, a parallel 
mechanism that inspects simultaneously all three columns is necessary. 
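
A simplified version of this observation can be checked mechanically. The sketch below is not
the full sequentiality test of [5]; it only looks for a column in which every pattern has a
constant, and checks pairwise incompatibility. The helper names constant_columns and ambiguous
are assumptions introduced for illustration.

```python
from itertools import combinations

def constant_columns(patterns):
    """Columns in which every pattern has a constant (no wildcard "_")."""
    width = len(patterns[0])
    return [j for j in range(width)
            if all(p[j] != "_" for p in patterns)]

def compatible(p, q):
    """Two flat patterns share an instance when no column has two distinct constants."""
    return all(a == "_" or b == "_" or a == b for a, b in zip(p, q))

def ambiguous(patterns):
    """True when at least two patterns have a common instance."""
    return any(compatible(p, q) for p, q in combinations(patterns, 2))

booleans = [("true", "true"), ("_", "false"), ("false", "true")]
berry = [("A", "A", "_"), ("B", "_", "A"), ("_", "B", "B")]

boolean_columns = constant_columns(booleans)   # the second column (index 1)
berry_columns = constant_columns(berry)        # no such column
berry_is_ambiguous = ambiguous(berry)
```

On the Boolean example the second column qualifies, whereas Berry's patterns are pairwise
incompatible yet leave no column to start from.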

But the restriction that patterns must be non-ambiguous is a burden to the programmer 
especially when the program contains data structures with many different constructors. This 
is one of the reasons why most programming languages that feature call by pattern matching 
accept ambiguities and impose a priority rule between different patterns. In this paper we do 
not discuss assignment of priorities. In ML and other programming languages, for instance, 
the order of patterns in the text is used and the programmer has to write the more specific cases 
before the more general ones. Another possibility is to automatically assign higher priority to
more specific cases and still use textual ordering for those that are compatible. Both priority
rules have the same expressive power, as any set of patterns can be ordered to work in exactly
the same way under either of them.

When ambiguities are allowed and a priority rule is imposed, the method of [5] does not 
apply directly, as shown below: 
case (x, y) of
    (_, true)    -> 1
  | (false, _)   -> 2
  | (_, _)       -> 3

Now take any pair (x, y): if y = true then the pair matches the first case.
Otherwise, if x = false, it matches the second one. Finally, in every other case, it matches the
third one. Note that it is slightly subtle to find the set of pairs that match each of these
three cases: the first case corresponds to any pair (x, true), the second one to (false, false)
and the third one to (true, false). In this example, where the first and the second columns each
contain only one constant, the method of [5] does not apply directly. It can be adapted (as in
[6]) by imposing priorities on the patterns to make them non-ambiguous.
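
The sets of pairs selected by each case can be enumerated directly. The following Python sketch
assumes a flat encoding of the three rules, with "_" as wildcard; fits and selected_rule are
hypothetical helper names.

```python
from itertools import product

# The three prioritized rules of the example, in textual order.
RULES = [(("_", True), 1), ((False, "_"), 2), (("_", "_"), 3)]

def fits(pattern, pair):
    """A pair fits a pattern when every non-wildcard column is equal."""
    return all(p == "_" or p == v for p, v in zip(pattern, pair))

def selected_rule(pair):
    """Number of the first rule, in textual order, that the pair fits."""
    for pattern, number in RULES:
        if fits(pattern, pair):
            return number
    raise ValueError("no rule matches")

table = {pair: selected_rule(pair)
         for pair in product([True, False], repeat=2)}
```

The resulting table maps both (true, true) and (false, true) to the first case, (false, false)
to the second and (true, false) to the third, in agreement with the description above.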

Our approach in this work is to use the data structure of constrained terms to represent 
sets of patterns ordered by priorities such that the disambiguating rule becomes part of the 
representation. In the previous example, the set of constrained terms that represents the match 
problem is: 



(_, true), (false, ◇true) and (◇false, ◇true)



in which ◇C represents any value different from C, and the strict set is:



(_, •), (•, ◇true)



Notice that an algorithm that evaluates from left to right will also loop on the term (•, true)
while an algorithm that evaluates this pair from right to left will not. With the non-ambiguous
set of terms given above it becomes possible to apply the method of [5] and choose the second
column as the one to look at first (where ◇true is considered as a constant).

In [6], a program for the compilation of patterns with priorities was written in CAML [10]. 
The construction of the new set of non-ambiguous patterns is embedded in the control of the 
program. In our work, the transformation from ambiguous to non-ambiguous patterns will 
be achieved at the level of source programs. This makes the program transformation explicit 
and independent of the pattern matching process. Furthermore, the algorithm presented here 
produces very compact representations, especially in the matching of terms with arbitrarily 
large signature. 
2 Terms and Constraints
2.1 Terms 

Let X be a denumerable set of variables and Σ a set containing function symbols and an
additional symbol •. Each function symbol has an associated arity. For our purpose the
language of terms T(Σ, X) is defined by:

terms: t ::= F(t₁, . . . , tₙ) | x | •

where the function symbol F is a symbol of Σ of arity n, the variable x is in X and t₁, . . . , tₙ
are terms. The set of terms without variables is the set T(Σ) of ground terms. The set of
partially evaluated terms is the set T(Σ − {•}, X). A linear term is a term in which each
variable occurs at most once.

The set O(t) of occurrences of a term t is recursively defined by:

    ε ∈ O(t)
    i.u ∈ O(F(t₁, . . . , tₙ)) if u ∈ O(tᵢ) (1 ≤ i ≤ n)

If u ∈ O(t), the subterm t/u of t is defined by:

    t/ε = t
    F(t₁, . . . , tₙ)/i.u = tᵢ/u
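
These two definitions translate almost literally into code. The sketch below assumes an
illustrative encoding that is not part of the paper: a term F(t₁, . . . , tₙ) is the tuple
("F", t1, ..., tn), a variable is a plain string, and an occurrence is a tuple of 1-based
argument positions with () standing for ε.

```python
def occurrences(t):
    """O(t): () is the root; (i,) + u goes into the i-th argument, then follows u."""
    occs = {()}
    if isinstance(t, tuple):
        for i, arg in enumerate(t[1:], start=1):
            occs |= {(i,) + u for u in occurrences(arg)}
    return occs

def subterm(t, u):
    """t/u: the subterm of t at occurrence u (u must belong to O(t))."""
    for i in u:
        t = t[i]  # position 0 holds the symbol, so argument i is at index i
    return t

# F(A, G(x)) with a constant A and a variable x
term = ("F", ("A",), ("G", "x"))
all_occurrences = occurrences(term)
inner = subterm(term, (2, 1))
```

For F(A, G(x)) the occurrences are ε, 1, 2 and 2.1, and the subterm at 2.1 is the variable x.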

Definition 1 

1. A (ground) substitution σ is a mapping over terms defined by replacing a finite set of
variables by (ground) terms, which transforms any term t into σ(t). The term σ(t) is
called an instance of t.

2. The quasi-ordering ≤ over terms is defined by t ≤ t′ if there exists a substitution σ such
that σ(t) = t′, and t is said to be a prefix of t′. Its extension to substitutions is defined by
σ ≤ σ′ if and only if there exists a substitution η such that σ′ = η ∘ σ. Thus σ is said to
be more general than σ′.

3. Two terms t and t′ are comparable if either t ≤ t′ or t′ ≤ t, and compatible or unifiable
if there exists a substitution σ such that σ(t) = σ(t′), in which case σ is a unifier of t
and t′.

4. The least upper bound of two terms t and t′, denoted t ⊔ t′, is the smallest term that has
both t and t′ as prefixes. The unifier that produces this bound, if it exists, is called the
most general unifier (m.g.u.). The greatest lower bound of two terms t and t′, denoted
t ⊓ t′, is the greatest prefix of t and t′ (with respect to ≤).
In the following it will be convenient to identify a term with the set of its ground
instances. For example, let Σ = {F, A, B, •}. The term t = F(x, y) represents the set
{F(A, A), F(B, A), F(•, A), F(F(A, A), A), . . .}. Any ground term t represents the set {t}.
Two incompatible terms represent disjoint sets of ground terms.

The relation ≤ is the opposite of the set inclusion of the ground instances. A substitution
can be seen as an operation that builds terms from the root to the leaves. The special
term • denotes terms that cannot be built, for instance those whose construction does not
terminate; a substitution σ such that σ(x) = • can be likened to a construction that never
ends.

Now we want to represent more precisely sets of terms; for instance the subset of all terms
which are not instances of F(x, y). Thus we classify all the terms in three parts: those that
are instances of F(x, y), the term • which represents terms that cannot be built and those that
are instances of any G(. . .) with G ≠ F and G ≠ •. The last part represents terms for which
we do know that they are not instances of F(x, y) while the second part represents terms for
which we cannot say anything. The subset of F(x, y) of all terms different from F(A, B) is
the set of ground terms {F(σ(x), σ(y)) | σ(x) ◇ {A} or σ(y) ◇ {B}}. With the finite signature
Σ = {F, A, B, •}, this set is represented as the union of F(x, A), F(x, F(y, z)), F(B, x)
and F(F(x, y), z). This representation depends on the number of elements of the signature:
for instance, using Σ′ = {F, A, B, C, •} the representation as a union of terms has two extra
components, F(x, C) and F(C, y). With an infinite signature it is not possible to represent
this set as a finite union of instances of terms. Notice that the two terms F(•, y) and F(x, •),
which are instances of F(x, y), do not belong to {F(σ(x), σ(y)) | σ(x) ◇ {A} or σ(y) ◇ {B}}.
It is more concise to represent those sets by terms with variables with constraints. This can be
illustrated as follows:

1. {t = F(x, y) such that t ◇ F(A, B)} = F(x ◇ {A}, y) ∪ F(x, y ◇ {B})

2. F(B, A) ∪ F(B, F(x, y)) ∪ F(F(x, y), A) ∪ F(F(x, y), F(z, t)) = F(x ◇ {A}, y ◇ {B})

3. F(F(x, y), A) ∪ F(F(x, y), F(z, t)) = F(x ◇ {A, B}, y ◇ {B})

We will now formally introduce the notions of constraint and of constrained term in order 
to represent such sets of ground terms. Roughly speaking, a constrained term is composed 
of a term and a constraint which is a predicate over the variables of the term. This predicate 
restricts the possible instances of the variables in subsequent substitutions as we will see below. 

2.2 Constraints 

Definition 2 Let t and t′ be two terms. The quasi-ordering ⊑ between two terms is defined
by: t ⊑ t′ if and only if there exists a term t″ such that t ⊑₀ t″ ≤ t′ where ⊑₀ is characterized
by the following rules:

Let x and y be two variables, F a symbol in Σ and t a term:
x ⊑₀ y



January 1990 



Digital PRL 






F(t₁, . . . , tₙ) ⊑₀ F(t′₁, . . . , t′ₙ) if and only if for every i (1 ≤ i ≤ n), tᵢ ⊑₀ t′ᵢ
t ⊑₀ •

Lemma 1 Let t be a linear term and t′ a term. t ⊑ t′ if and only if there exists t″ such that
t ≤ t″ ⊑₀ t′.

Proof: Let U = {u ∈ O(t) ∩ O(t′) | t′/u = •} and t″ = t′[u ← t/u | u ∈ U]. Clearly t ≤ t″ ⊑₀ t′.

When the names of variables in a term are not important (which will be the case for linear terms
in the following) we will use the symbol Ω instead of the names of variables.

The greatest lower bound of two terms is equal to the one for the prefix ordering. The least
upper bound can be characterized by the following rules: let l be a term, x a variable and F
and G two different symbols in Σ.

    x ⊔ l = l
    l ⊔ x = l
    F(. . .) ⊔ G(. . .) = •
    F(l₁, . . . , lₙ) ⊔ F(l′₁, . . . , l′ₙ) = F(l₁ ⊔ l′₁, . . . , lₙ ⊔ l′ₙ)
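
The four rules can be read as a recursive function. The Python sketch below is an illustration
under assumptions: variables are all written "Ω", terms are tuples ("F", t1, ..., tn), and •
is the constant ("•",). The rules above do not state the case where one argument is already •;
the sketch assumes that • absorbs the other argument, which is consistent with t ⊑₀ • but is an
assumption of this example.

```python
OMEGA = "Ω"       # any variable of a linear term
BULLET = ("•",)   # the non-evaluable term

def lub(l1, l2):
    """Least upper bound of two linear terms, following the four rules."""
    if l1 == OMEGA:                 # x ⊔ l = l
        return l2
    if l2 == OMEGA:                 # l ⊔ x = l
        return l1
    if l1 == BULLET or l2 == BULLET:
        return BULLET               # assumption: • absorbs (not stated by the rules)
    if l1[0] != l2[0] or len(l1) != len(l2):
        return BULLET               # F(...) ⊔ G(...) = •
    return (l1[0],) + tuple(lub(a, b) for a, b in zip(l1[1:], l2[1:]))

A, B = ("A",), ("B",)
same_head = lub(("F", A, OMEGA), ("F", OMEGA, B))   # F(A, B)
clash = lub(A, B)                                   # •
inner_clash = lub(("F", A, OMEGA), ("F", B, OMEGA)) # F(•, Ω)
```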

The relation ⊑ is used to define predicates over terms that we call constraints. To each set
L of linear terms is associated a predicate over terms denoted t◇L which is true if and only if
l ⋢ t for every l in L. These constraints are said to be structural as they depend only on the
term structure, as opposed to arbitrary predicates.

Definition 3 (Constraint) A constraint is recursively defined as either an atomic predicate
t◇L or the disjunction of two constraints or the conjunction of two constraints.



Constraint: P ::= term ◇ set of linear terms
                | P ∨ P
                | P ∧ P

When (t◇L) we say that L is a constraint over t.

The truth value of compound constraints is obtained by standard interpretation of the
logical connectives. We write ⊨ P if and only if the predicate P is true. By using the
usual equivalences on the connectives or (∨) and and (∧), a constraint can always be written in
disjunctive normal form (as a disjunction of conjunctions of atomic constraints).

Definition 4 (Substitution over a Constraint) Let σ be a substitution, t◇L an atomic
constraint, P₁, P₂ two constraints. By definition,

σ(t◇L) = σ(t)◇L, σ(P₁ ∨ P₂) = σ(P₁) ∨ σ(P₂) and σ(P₁ ∧ P₂) = σ(P₁) ∧ σ(P₂).
A substitution σ satisfies a constraint P if and only if ⊨ σ(P).

For instance, ⊭ (F(A, B)◇{F(A, Ω)}) and ⊨ (F(A, B)◇{F(A, •)}).
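
The two instances above can be reproduced with a small evaluator for ⊑ and for atomic
constraints. This is a sketch under assumptions: it only handles linear terms l whose variables
are written "Ω", it encodes terms as tuples ("F", t1, ..., tn) with • as ("•",), and the names
below and holds are hypothetical.

```python
OMEGA = "Ω"       # any variable of a linear term
BULLET = ("•",)   # the non-evaluable term

def below(l, t):
    """l ⊑ t for a linear term l: t is an instance of l up to parts replaced by •."""
    if l == OMEGA or t == BULLET:
        return True            # Ω is a prefix of everything; everything is ⊑₀ •
    if l == BULLET or t == OMEGA:
        return False           # • is only below •; a variable is only above Ω
    return (l[0] == t[0] and len(l) == len(t)
            and all(below(a, b) for a, b in zip(l[1:], t[1:])))

def holds(t, L):
    """Atomic constraint t◇L: true when no l in L satisfies l ⊑ t."""
    return not any(below(l, t) for l in L)

A, B = ("A",), ("B",)
t = ("F", A, B)
with_omega = holds(t, [("F", A, OMEGA)])    # F(A, Ω) ⊑ F(A, B), so the constraint fails
with_bullet = holds(t, [("F", A, BULLET)])  # F(A, •) is not below F(A, B)
```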

Two constraints P and P′ are said to be equivalent, denoted P ≡ P′, if and only if the sets of
substitutions satisfying P and P′ are the same. A constraint P implies a constraint Q, denoted
P ⇒ Q, if and only if every substitution satisfying P also satisfies Q.

Remarks: For every term t and for every substitution η, ⊭ (η(t)◇{Ω, . . .}) and ⊨ (η(t)◇{}).
In what follows ⊥ and ⊤ will denote respectively t◇{Ω} and t◇{}. Notice that when ⊭ P
then for every substitution η, ⊭ (η(P)) and thus P ≡ ⊥. Also if there is a substitution that
satisfies P (noted ⊨ P), P is not always equivalent to ⊤ as there may be some substitutions
that do not satisfy P.

2.3 Constraint Simplification 

We can prove by induction on the structure of l that t◇{l} ∨ t◇{l′} ≡ t◇{l ⊔ l′} and that
if l ⊑ l′ then t◇{l} ∧ t◇{l′} ≡ t◇{l}. From those properties we deduce the following
simplification rules that associate to each constraint an equivalent normal form where the term
in each atomic constraint is a variable:

Let t, t₁, . . . , tₙ be terms, t′₁, . . . , t′ₙ be linear terms, L be a set of linear terms and x a variable.

    F(t₁, . . . , tₙ)◇({F(t′₁, . . . , t′ₙ)} ∪ L)  ≡  (⋁ 1≤i≤n tᵢ◇{t′ᵢ}) ∧ F(t₁, . . . , tₙ)◇L
    F()◇({F()} ∪ L)                              ≡  ⊥
    F(t₁, . . . , tₙ)◇({G(. . .)} ∪ L)             ≡  F(t₁, . . . , tₙ)◇L
    t◇{}                                         ≡  ⊤
    t◇({Ω} ∪ L)                                  ≡  ⊥
    t◇({l, σ(l)} ∪ L)                            ≡  t◇({l} ∪ L)
    t◇L ∧ t◇L′                                   ≡  t◇(L ∪ L′)
    ⋀ᵢ t◇{lᵢ} ∨ ⋀ⱼ t◇{l′ⱼ}                        ≡  ⋀ᵢ,ⱼ t◇{lᵢ ⊔ l′ⱼ}

These simplification rules define a function, denoted simpl, which transforms a constraint P
into ⊥, ⊤, or an equivalent constraint over variables. Notice that these rules are different
from those of disequations in [3] because we deal explicitly with the symbol • that represents
non-evaluable terms. For example x◇{A} ∨ x◇{B} ≢ x◇{A} ∨ A◇{B} because • does not
satisfy the left part while the right part is equivalent to ⊤.

The restriction of a simplified constraint P to a given set of variables V is simpl(P′) where
P′ is the constraint obtained when replacing by ⊤ all of the atomic constraints of the form
x◇Lₓ such that x ∉ V. For instance the constraint (x◇{F(Ω, Ω)} ∧ y◇{A}) restricted to {x}
is (x◇{F(Ω, Ω)} ∧ ⊤) ≡ x◇{F(Ω, Ω)}; the restriction of (x◇{F(Ω, Ω)} ∨ y◇{A}) to {x} is
⊤. Notice that when ⊨ P we have ⊨ P′ for any restriction P′ of P.

The following lemma shows the relation between the constraints and the prefix ordering.
Lemma 2

1. Let σ be a substitution satisfying a constraint P = t◇L. Every substitution ρ more
general than σ satisfies P.

2. Let t and l be two terms. If t ⊑ l and t ≰ l then t◇{l} ≡ ⊤.

3. Let t be a term and σ, ρ two substitutions. t◇{σ(t)} ⇒ t◇{ρ(t)} if and only if
σ(t) ⊑ ρ(t). More generally, ⋀ᵢ t◇{σᵢ(t)} ⇒ t◇{ρ(t)} if and only if there exists σᵢ
such that σᵢ(t) ⊑ ρ(t).

4. Let t and l be two terms and L a set of terms. Either t◇{l} ≡ ⊤ or there exists a
substitution σ such that t◇{l} ≡ t◇{σ(t)}. More generally for any predicate t◇L there
exists a possibly empty set of terms L′ = {l₁, . . . , lₙ} such that t ≤ lᵢ and t◇L ≡ t◇L′.

Proof: These properties are proved by induction on the structure of terms.

1. Let us suppose that P = t◇L. If Ω ∈ L or t = •, the left part of the implication is never
satisfied. The only property to prove is that for every term t, every l = F(l₁, . . . , lₙ)
and substitution σ, ⊭ t◇{l} implies ⊭ σ(t)◇{l}. The hypothesis implies t = F(t₁, . . . , tₙ)
and for every i (1 ≤ i ≤ n), ⊭ tᵢ◇{lᵢ}. By induction ⊭ σ(tᵢ)◇{lᵢ} and by definition
⊭ σ(t)◇{l}. The proof easily extends to arbitrary constraints but the extension is not
necessary because, as we will see below, any predicate is equivalent to one of the form
t◇L.

2. If t ⊑ l and t ≰ l then there exists an occurrence u of both terms such that l/u = • and
t/u is not a variable and is different from •. Thus for every substitution η the subterm
η(t)/u is also different from • and is not a variable, which implies l ⋢ η(t).

3. If t◇{σ(t)} ⇒ t◇{ρ(t)}, ρ does not satisfy t◇{σ(t)} because it does not satisfy
t◇{ρ(t)} and thus σ(t) ⊑ ρ(t). Conversely, for every substitution η satisfying t◇{σ(t)},
σ(t) ⋢ η(t). As σ(t) ⊑ ρ(t), ρ(t) ⋢ η(t) and we conclude that η satisfies t◇{ρ(t)}. The
generalization is made by simple manipulation of logical connectives.

4. We remark that t◇{l} ≡ t◇{l} ∨ t◇{t} ≡ t◇{l ⊔ t}. As t ⊑ l ⊔ t, by part (2) either
t◇{l ⊔ t} ≡ ⊤ or t ≤ l ⊔ t, which proves the property. The generalization is made by simple
manipulation of logical connectives.

2.4 Constrained Terms 

Definition 5 (Constrained Term) Let t be a term and P a constraint. A constrained term
{t | P} is the set of ground instances of t satisfying the restriction P′ of P to the variables of t.

constrained terms : T ::= {t | P}
In what follows we will call t the pure part of T and substitutions over pure terms will be
called pure substitutions. The set of occurrences of T is that of its pure part. The subterms of
{t | P} are of the form {t′ | P} where t′ is a (pure) subterm of t. Notice that by definition, the
constraint part of a subterm is restricted to the variables occurring in its pure part. When ⊭ P
the term {t | P} represents the empty set of terms that we note ∅. The following properties on
the sets represented by constrained terms are easy to check:

1. When P ≡ Q, the terms {t | P} and {t | Q} are the same.

2. {t | P ∨ Q} = {t | P} ∪ {t | Q}

3. {t | P ∧ Q} = {t | P} ∩ {t | Q}

Any constraint P is equivalent to its disjunctive normal form ⋁ᵢ Pᵢ where each Pᵢ is a
conjunction of atomic constraints over variables and thus {t | P} = ⋃ᵢ {t | Pᵢ}. This gives a
practical representation of constrained terms which is very close to their implementation.

Example: Let T = {F(x, y) | P} where P = F(x, y)◇{F(A, B)} ∧ y◇{C} ∧ z◇{A}. As
the variable z does not appear in T, the restriction of P is F(x, y)◇{F(A, B)} ∧ y◇{C} and
T = T₁ ∪ T₂ where T₁ = {F(x, y) | x◇{A} ∧ y◇{C}} and T₂ = {F(x, y) | y◇{B, C}}. Notice
that in general the terms Tᵢ are not disjoint, as in this example where the term F(B, A) belongs
to both T₁ and T₂.

Even in the case of an infinite alphabet, term representation with constraints can be finite 
which is not the case with the classical representation. 
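
The example above can be checked by brute force over a small set of ground arguments. The
Python sketch below is illustrative only: the tuple encoding of terms, the restriction to the
arguments A, B, C and •, and the helper names below, holds, in_T, in_T1 and in_T2 are
assumptions of this example.

```python
from itertools import product

OMEGA = "Ω"       # any variable of a linear term
BULLET = ("•",)   # the non-evaluable term

def below(l, t):
    """l ⊑ t for a linear term l whose variables are written Ω."""
    if l == OMEGA or t == BULLET:
        return True
    if l == BULLET or t == OMEGA:
        return False
    return (l[0] == t[0] and len(l) == len(t)
            and all(below(a, b) for a, b in zip(l[1:], t[1:])))

def holds(t, L):
    """Atomic constraint t◇L: no l in L satisfies l ⊑ t."""
    return not any(below(l, t) for l in L)

A, B, C = ("A",), ("B",), ("C",)

def in_T(x, y):    # restriction of P: F(x, y)◇{F(A, B)} ∧ y◇{C}
    return holds(("F", x, y), [("F", A, B)]) and holds(y, [C])

def in_T1(x, y):   # x◇{A} ∧ y◇{C}
    return holds(x, [A]) and holds(y, [C])

def in_T2(x, y):   # y◇{B, C}
    return holds(y, [B, C])

ground_args = [A, B, C, BULLET]
union_ok = all(in_T(x, y) == (in_T1(x, y) or in_T2(x, y))
               for x, y in product(ground_args, repeat=2))
overlap = in_T1(B, A) and in_T2(B, A)   # F(B, A) lies in both T1 and T2
```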

2.5 Substitution 

Definition 6 Let σ be a substitution and Q a constraint. The constrained substitution
σᶜ = (σ, Q) is the mapping over constrained terms defined by σᶜ({t | P}) = {σ(t) | σ(P) ∧ Q}.
When ⊨ σ(P) ∧ Q, σᶜ is admissible for {t | P}.

Notice that when ⊭ Q the substitution σᶜ = (σ, Q) maps every term to the term ∅.

We compose two constrained substitutions σᶜ₁ = (σ₁, Q₁) and σᶜ₂ = (σ₂, Q₂) as usual and
check easily that σᶜ₂ ∘ σᶜ₁ = (σ₂ ∘ σ₁, simpl(Q₂ ∧ σ₂(Q₁))).

The definition of quasi-ordering (≤) is extended to constrained terms using constrained
substitutions instead of pure substitutions. The least upper bound (⊔) is defined with respect
to ≤. The notion of unifier of two terms T and T′ is also extended but the unifier has to be
admissible for both terms to avoid the empty term as the only common instance of T and T′.
The equality (=) of two terms is defined by: T₁ = T₂ if and only if T₁ ≤ T₂ and
T₂ ≤ T₁. We give now a characterization of these concepts in order to compute separately the
pure part and the constraint.

Lemma 3 Let T₁ = {t₁ | P₁} and T₂ = {t₂ | P₂} be two constrained terms.

1. T₁ ≤ T₂ if and only if there exists a substitution σ such that σ(t₁) = t₂ and P₂ ⇒ σ(P₁).

2. σᶜ = (σ, Q) unifies T₁ and T₂ if and only if σᶜ is admissible for T₁ and T₂, σ(t₁) = σ(t₂)
and σ(P₁) ∧ Q ≡ σ(P₂) ∧ Q. As usual, we say that two terms T₁ and T₂ are unifiable or
compatible, denoted T₁ ↑ T₂, if and only if there exists a unifier for them.

3. T₁ ⊔ T₂ = {t₁ ⊔ t₂ | σ(P₁ ∧ P₂)} where σ is the most general unifier of t₁ and t₂. ⊔ is
not defined when t₁ and t₂ are not unifiable or when ⊭ σ(P₁ ∧ P₂). The substitution
σᶜ = (σ, σ(P₁ ∧ P₂)) is a principal unifier for T₁ and T₂.

4. Let T = {t | t◇{l}} and M = {m | ⊤} be two constrained terms (in fact the last one is a
pure term). T ≤ M if and only if t ≤ m and m # l.

Proof:

1. T₁ ≤ T₂ if and only if there exists σᶜ = (σ, Q) such that σ(t₁) = t₂ and P₂ ≡ σ(P₁) ∧ Q. This
implies P₂ ⇒ σ(P₁). Conversely, if there exists σ such that σ(t₁) = t₂ and P₂ ⇒ σ(P₁), as
P₂ ≡ σ(P₁) ∧ P₂, σᶜ = (σ, P₂) satisfies σᶜ(T₁) = T₂.

2. This is a simple consequence of the definition of equality.

3. Obviously {t₁ ⊔ t₂ | σ(P₁ ∧ P₂)} is an upper bound of T₁ and T₂. Now, let T = {t | P}
be an upper bound of T₁ and T₂. By definition of the least upper bound of pure terms,
there exists a substitution ρ such that ρ(t₁ ⊔ t₂) = t; as T is an upper bound of T₁ and
T₂, there exist σ₁ and σ₂ such that σ₁(t₁) = σ₂(t₂) = t, P ⇒ σ₁(P₁) and P ⇒ σ₂(P₂).
Consequently ρ(σ(t₁)) = σ₁(t₁) and ρ(σ(t₂)) = σ₂(t₂). These equalities hold on the
variables of t₁ and of t₂ and, when applied to the constraints, give σ₁(P₁) = ρ(σ(P₁))
and σ₂(P₂) = ρ(σ(P₂)). In conclusion P ⇒ ρ(σ(P₁ ∧ P₂)).

4. Let σ be the substitution such that σ(t) = m. As M is a pure term, for every substitution
η, l ⋢ η ∘ σ(t) = η(m). Thus l ⋢ η(m) and, by definition, for every substitution ρ,
ρ(l) ≠ η(m); that means, l and m are incompatible. This result is easily generalized to a
set of constraints, T = {t | t◇{l₁, . . . , lₙ}}. In that case, T ≤ M if and only if t ≤ m
and, for every i (1 ≤ i ≤ n), m # lᵢ.

Example: The most general unifier σᶜ of the terms:

    T  = {G(x, y, z) | (x◇{H(u)}, y◇{C}, z◇{H(H(B))})}
    T′ = {G(P(A), y′, H(z′)) | (y′◇{B}, z′◇{C})}

is defined by:

    σ(x) = P(A)
    σ(y) = y′
    σ(z) = H(z′)

    σᶜ = (σ, (y′◇{B, C}, z′◇{H(B), C}))
In general, the greatest lower bound of two terms does not exist. For instance, the common
prefixes of {A | ⊤} and {B | ⊤} (with A ≠ B) are of the form {x | x◇L}, but for each prefix there
is a constant C not belonging to L and different from A and B such that {x | x◇(L ∪ {C})} is
a prefix of both terms less general than {x | x◇L}. We give now a sufficient condition for the
existence of the greatest lower bound:

Lemma 4 Let T₁ = {t₁ | P₁} and T₂ = {t₂ | P₂} be two compatible constrained terms. Let
σᵢ(t₁ ⊓ t₂) = tᵢ and σ′ᵢ the substitutions defined by σ′ᵢ(x′) = x if σᵢ(x) = x′ and σ′ᵢ(x′) = x′
otherwise. The greatest lower bound of T₁ and T₂ exists and is the term
T = {t₁ ⊓ t₂ | σ′₁(P₁) ∨ σ′₂(P₂)}.

Proof: Remember that {t₁ ⊓ t₂ | σ′₁(P₁) ∨ σ′₂(P₂)} = {t₁ ⊓ t₂ | Q₁ ∨ Q₂} where each Qᵢ is the
restriction of σ′ᵢ(Pᵢ) to the variables of t₁ ⊓ t₂ and thus σᵢ(Qᵢ) = Qᵢ. As each Pᵢ implies Qᵢ, T is
a prefix of each Tᵢ. Let T′ be a prefix of both T₁ and T₂ greater than T. Thus T′ = {t₁ ⊓ t₂ | P′}
such that P′ ⇒ Q₁ ∨ Q₂ and Pᵢ ⇒ σᵢ(P′). Consequently, (Qᵢ ≡) σ′ᵢ(Pᵢ) ⇒ σ′ᵢ ∘ σᵢ(P′) (≡ P′),
and thus P′ ≡ Q₁ ∨ Q₂.

Example: The terms T₁ = {F(x, y) | x◇{A}, y◇{B}} and T₂ = {F(B, y) | y◇{A, C}} are
compatible and T₁ ⊓ T₂ = {F(x, y) | x◇{A}, y◇{•}}. Incompatible terms may also have a
greatest lower bound, for instance {A | ⊤} ⊓ {x | x◇{A}} = {x | x◇{•}}.

Definition 7 (Restriction by a Substitution) Let T = {t | P} be a constrained term and σ a
pure substitution. The restriction of T by σ is the constrained term:

    T|σ = {t | simpl(P ∧ t◇{σ(t)})}

Notice that restriction is defined only for a pure substitution because there is no constrained
term in a constraint.

Example: For the term T = {F(x, y) | x◇{H(A)}} and the substitution σ defined by
σ(x) = H(z), σ(y) = A, we obtain:

    σ(T) = {F(H(z), A) | z◇{A}} and
    T|σ = {F(x, y) | F(x, y)◇{F(H(Ω), A)} ∧ x◇{H(A)}}
        = {F(x, y) | x◇{H(z), H(A)}} ∪ {F(x, y) | (x◇{H(A)}, y◇{A})}

Notice that in general the terms of T|σ may have common instances like the term F(A, B) in
this example.

To each substitution σ we also associate the set σ̂ of substitutions corresponding to
non-evaluated parts of σ.

Definition 8 Let σ be a substitution. σ̂ is the set of substitutions σ′ such that σ′(x) is σ(x) in
which some non-variable subtrees different from • have been replaced by •.

A substitution σ is considered as the semantics of an operation that defines more precisely a
partial term T and the terms σ′(T), for σ′ in σ̂, represent those instances that failed to be
evaluated. This is why we call the set {σ′(T) | σ′ ∈ σ̂}, written σ̂(T), the strict part of T with
respect to σ. The set σ(T) ∪ T|σ is the calculable part of T with respect to σ.

Lemma 5 Let σ be a pure substitution, Id the identity substitution and T = {t | P} a
constrained term. T|Id = ∅ and σ(T) ∩ T|σ = ∅. The set of instances of T is the union of
instances of σ(T), T|σ and σ̂(T).

Proof: By definition if η(t) is an element of σ(T) then σ(t) ≤ η(t) and thus ⊭ η(t)◇{σ(t)}.
As a consequence, the two sets are disjoint and T|Id is empty. The last property is proved by
cases on the definition of ⊑.

The proposition below will be useful in the definition of the decomposition of a term. 

Proposition 1 For every term T and substitution σ, T is the greatest lower bound of σ(T),
T|σ and all the σ̂(T).

Proof: Let T = {t | t◇L}. It is easy to prove that T is a prefix of σ(T), T|σ and all the σ̂(T).
Furthermore, if there exists a common prefix T₀ of these terms that is not a prefix of T, T ⊔ T₀
is also a common prefix. Now let T′ = {t′ | P′} be a lower bound of σ(T), T|σ and all the σ̂(T)
such that T ≤ T′. By hypothesis on the pure terms, t ≤ t′ ≤ t and thus t = t′. The hypotheses
on the constraints are:

    t◇L ∧ t◇{σ(t)} ⇒ P′ ⇒ t◇L
    σ(t)◇L ⇒ σ(P′)
    σ̂(t)◇L ⇒ σ̂(P′)

The first property implies P′ ≡ t◇L ∧ P″ and t◇{σ(t)} ⇒ P″. By lemma 2 either P″ ≡ ⊤,
which implies P′ ≡ P, or P″ ≡ ⋀ᵢ t◇{ρᵢ(t)}. As t◇{σ(t)} ⇒ ⋀ᵢ t◇{ρᵢ(t)}, by lemma 2 again,
for each i, σ(t) ⊑ ρᵢ(t). Consequently the following implications are satisfied:

    t◇L ∧ t◇{σ(t)} ⇒ t◇L ∧ ⋀ᵢ t◇{ρᵢ(t)}        (1)

    σ(t)◇L ⇒ σ(t)◇L ∧ ⋀ᵢ σ(t)◇{ρᵢ(t)}          (2)

    σ̂(t)◇L ⇒ σ̂(t)◇L ∧ ⋀ᵢ σ̂(t)◇{ρᵢ(t)}          (3)

Either σ(t) ≤ ρᵢ(t) or there exists t″ such that σ(t) ⊑₀ t″ ≤ ρᵢ(t). In the second case, no • is
inserted inside t, otherwise t″ could not be a prefix of ρᵢ(t), and t″ = σ̂(t). In both cases, as a
consequence of lemma 2, (2) and (3) respectively imply t◇L ⇒ t◇{ρᵢ(t)} and thus P ⇒ P′,
which means T = T′.
3 Term Decomposition

We want to partition the set of all terms represented by T into disjoint sets, following an
ordered list of linear terms S = $(s_1, \ldots, s_n)$ called patterns. The decomposition of T with
respect to S consists in splitting the set associated with T into subsets such that each subset
contains instances of at most one element of S. For example, let S = $\{F(A, H(\Omega)), F(A, \Omega)\}$
and T = $\{F(x,y) \mid \top\}$. The evaluable part of T is the disjoint union of $T_1$, $T_2$ and $T_3$ where
$T_1 = \{F(A, H(x)) \mid \top\}$, $T_2 = \{F(A, y) \mid y \diamond \{H(\Omega)\}\}$ and $T_3 = \{F(x,y) \mid x \diamond \{A\}\}$. Constrained
terms and decomposition were introduced in [8] to deal with recursive path orderings with
unavoidable sets.

3.1 Decomposition 

Definition 9 (Decomposition w.r.t. a Pattern) Let T be a constrained term, and s a pattern.
If T and s are unifiable with $\sigma$ as their most general unifier, the decomposition of T w.r.t. s,
denoted compat(T, s), is equal to $\sigma(T)$.

With this definition, compat(T, $\Omega$) and T represent the same set of terms.
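
As an illustration only, Definition 9 can be sketched in Python for unconstrained linear terms. The tuple encoding, the name `OMEGA` and the helper names are assumptions made for this sketch and are not taken from the CAML prototype. Because the terms are linear and variables are anonymous, applying the most general unifier amounts to merging the two trees; constraints are ignored here.

```python
OMEGA = "_"  # assumption: one anonymous symbol stands for Omega and for every variable

def compat(t, s):
    """Return sigma(t) for the m.g.u. sigma of the linear terms t and s, or None."""
    if t == OMEGA:
        return s
    if s == OMEGA:
        return t
    if t[0] != s[0] or len(t) != len(s):
        return None  # clash of function symbols: t and s are not unifiable
    args = [compat(a, b) for a, b in zip(t[1:], s[1:])]
    if any(a is None for a in args):
        return None
    return (t[0], *args)

# Hypothetical constructors for the example that opens Section 3.
A, B = ("A",), ("B",)
def F(x, y): return ("F", x, y)
def H(x): return ("H", x)

T = F(OMEGA, OMEGA)               # the term F(x, y)
T1 = compat(T, F(A, H(OMEGA)))    # decomposition w.r.t. the pattern F(A, H(Omega))
```

In this encoding `compat(T, OMEGA)` returns `T` unchanged, which mirrors the remark that compat(T, $\Omega$) and T represent the same set of terms.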

Definition 10 (Decomposition w.r.t. an Ordered Set of Patterns) Let T be a constrained
term and S = $\{s_1, \ldots, s_n\}$ an ordered set of patterns. The decomposition of T w.r.t. S,
Decomp(T, S), is recursively defined as:

$$\begin{array}{lcl}
\mathrm{Decomp}(\emptyset, S) & = & \emptyset \\
\mathrm{Decomp}(T, \emptyset) & = & \emptyset \\
\mathrm{Decomp}(T, S) & = & \mathrm{Decomp}(T, \{s_2, \ldots, s_n\}) \quad \text{if } T \text{ and } s_1 \text{ are incompatible} \\
\mathrm{Decomp}(T, S) & = & \{\mathrm{compat}(T, s_1)\} \cup \mathrm{Decomp}(T|_{\sigma_1}, S) \quad \text{otherwise, where } \sigma_1 \text{ is the m.g.u. of } T \text{ and } s_1
\end{array}$$

Notice that Decomp stops when a pattern is already a factor of T because the restriction of a
term by the identity substitution is the empty set. The instances of T that do not belong to
Decomp(T, $S \cup \{\Omega\}$) are those for which there is no way to decide if they are instances of one
of the elements of S.

Proposition 2 Let T be a constrained term, u an occurrence of T and S = $\{s_1, \ldots, s_n\}$
a set of patterns. Then T is the greatest lower bound of Decomp(T, $\{s_1, \ldots, s_n, \Omega\}$) $\cup$

Proof: This property is a consequence of the definition of Decomp and of Proposition 1.
Notice that, in the decomposition of a term, S is used as an ordered list and thus $\Omega$ is the last
element of this list. This decomposition is a partition of the evaluable part of T.
3.2 Decomposition Procedure

Let T = $\{t \mid P\}$ be a constrained term and S = $\{s_1, \ldots, s_n\}$ a set of patterns.

Initialization step

$\theta_1 \leftarrow T$; $\mathcal{S} \leftarrow S$

Current step

$(\tau_i, \theta_{i+1}) \leftarrow (\sigma_i(\theta_i), \theta_i|_{\sigma_i})$ if $\theta_i$ and $s_i$ are unifiable with m.g.u. $\sigma_i$
$(\tau_i, \theta_{i+1}) \leftarrow (\emptyset, \theta_i)$ if not.

Then Decomp(T, S) = $\{\tau_1, \ldots, \tau_n\}$.

The following lemmas will be used for the pattern matching: 

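Lemma 6 Let S = $\{s_1, \ldots, s_n\}$ be a set of patterns and $\{\tau_1, \ldots, \tau_n\}$ the decomposition of
a variable x by the set S. For every i, $\tau_i = \{s_i \mid \bigwedge_{j<i} s_i \diamond \{s_j\}\}$ and for every i and $j \neq i$,
$\tau_i \cap \tau_j = \emptyset$.

Proof: The first part is proved by induction over i. As $\theta_1 = x$, $\tau_1 = \{s_1 \mid \top\}$ and
$\theta_2 = \{x \mid x \diamond \{s_1\}\}$. Now suppose that $\tau_i = \{s_i \mid \bigwedge_{j<i} s_i \diamond \{s_j\}\}$ and $\theta_{i+1} = \{x \mid \bigwedge_{j \leq i} x \diamond \{s_j\}\}$. Then
$\tau_{i+1} = \{s_{i+1} \mid \bigwedge_{j \leq i} s_{i+1} \diamond \{s_j\}\}$ and $\theta_{i+2} = \{x \mid \bigwedge_{j \leq i} x \diamond \{s_j\} \wedge x \diamond \{s_{i+1}\}\}$, which finishes the
proof. The second part is a direct consequence of Lemma 5.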

Example: Consider the set of patterns:

F(x,B), F(P(y),z), F(t,u), F(H(v),w)

The decomposition of the term T = F(x, y) gives the following set of constrained terms (in
the examples we write $x \diamond \{F\}$ instead of $x \diamond \{F(\Omega, \ldots, \Omega)\}$):

F(x,B), $\{F(P(y),z) \mid z \diamond \{B\}\}$, $\{F(t,u) \mid u \diamond \{B\} \wedge t \diamond \{P\}\}$

If we decompose the term T = x the result is:

F(x,B), $\{F(P(y),z) \mid z \diamond \{B\}\}$, $\{F(t,u) \mid t \diamond \{P\} \wedge u \diamond \{B\}\}$, $\{v \mid v \diamond \{F\}\}$

The strict set of x with respect to the patterns is:

$\bullet$, $F(x, \bullet)$, $\{F(\bullet, y) \mid y \diamond \{B\}\}$

Notice that in this example redundant patterns disappear. As we decompose a variable, the
new set of constrained patterns and the strict set of x represent the set of all the terms.
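
The shape given by Lemma 6 can be illustrated with a short Python sketch. The tuple encoding, the name `OMEGA`, and the functions `decompose_variable` and `belongs` are assumptions of this sketch, not part of the CAML prototype. Each pattern $s_i$ is paired with the earlier patterns it must avoid, a pattern is dropped when an earlier one is a prefix of it, and $\Omega$ is appended to the list as is done for the examples of Section 5. Constraints are kept unsimplified, so the last element lists all earlier patterns instead of the single symbol F.

```python
OMEGA = "_"  # assumption: anonymous stand-in for Omega and for pattern variables

def is_prefix(p, t):
    """p <= t: the term t is an instance of the linear pattern p."""
    if p == OMEGA:
        return True
    if t == OMEGA:
        return False
    return (p[0] == t[0] and len(p) == len(t)
            and all(is_prefix(a, b) for a, b in zip(p[1:], t[1:])))

def compatible(s, t):
    """s and t have a common instance (both are linear)."""
    if s == OMEGA or t == OMEGA:
        return True
    return (s[0] == t[0] and len(s) == len(t)
            and all(compatible(a, b) for a, b in zip(s[1:], t[1:])))

def decompose_variable(patterns):
    """Pairs (s_i, forbidden) standing for {s_i | s_i avoids each forbidden s_j}."""
    result = []
    for i, s in enumerate(patterns):
        earlier = patterns[:i]
        if any(is_prefix(e, s) for e in earlier):
            continue  # redundant: every instance of s matches an earlier pattern
        # earlier patterns incompatible with s exclude nothing, so they are omitted
        result.append((s, [e for e in earlier if compatible(e, s)]))
    return result

def belongs(term, constrained):
    """The term is an instance of s and incompatible with every forbidden pattern."""
    s, forbidden = constrained
    return is_prefix(s, term) and not any(compatible(term, f) for f in forbidden)

# Hypothetical constructors for the example above.
A, B = ("A",), ("B",)
def F(x, y): return ("F", x, y)
def P(x): return ("P", x)
def H(x): return ("H", x)

PATTERNS = [F(OMEGA, B), F(P(OMEGA), OMEGA), F(OMEGA, OMEGA), F(H(OMEGA), OMEGA), OMEGA]
PARTS = decompose_variable(PATTERNS)
```

On this input the redundant pattern F(H(v),w) is dropped and four constrained terms remain, in the same order as in the example; each ground term tried in the checks belongs to exactly one of them.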

The following lemma allows us to use the decomposition of a variable to compute the
decomposition of any term. The decomposition of a term then reduces to a unification with
these new patterns, as was illustrated with the previous example.
Lemma 7 Let S = $\{s_1, \ldots, s_n\}$ be a set of patterns, $\{s'_1, \ldots, s'_n\}$ = Decomp(x, S) and
T = $\{t \mid P\}$ a constrained term. Then Decomp(T, S) = $\{\sigma'_i(s'_i) \mid 1 \leq i \leq n\}$ where, for every
i, $\sigma'_i$ is the most general unifier of $s'_i$ and T.

Proof: Let $\sigma_i$ be the most general unifier of t and $s_i$ for every i ($1 \leq i \leq n$). By
definition Decomp(T, S) = $\{\{\sigma_i(t) \mid \sigma_i(P) \wedge (\bigwedge_{1 \leq j \leq i-1} \sigma_i(t) \diamond \{\sigma_j(s_j)\})\} \mid 1 \leq i \leq n\}$. As
$\{\sigma'_i(s'_i) \mid 1 \leq i \leq n\} = \{\{\sigma_i(t) \mid \sigma_i(P) \wedge (\bigwedge_{1 \leq j \leq i-1} \sigma_i(s_i) \diamond \{s_j\})\} \mid 1 \leq i \leq n\}$, in order to
prove the lemma, it is sufficient to prove the equivalence of the constraints $\sigma_i(t) \diamond \{\sigma_j(s_j)\}$ and
$\sigma_i(s_i) \diamond \{s_j\}$ for all integers i, j such that j < i. $\sigma_i(t) \diamond \{\sigma_j(s_j)\} \equiv \sigma_i(s_i) \diamond \{s_j\}$ if and only if,
for every substitution $\eta$, $\sigma_j(s_j) \not\sqsubseteq \eta \circ \sigma_i(t)$ if and only if $s_j \not\sqsubseteq \eta \circ \sigma_i(s_i)$. The if part is clear. Let
us suppose that there exists $\eta$ such that $\sigma_j(s_j) \not\sqsubseteq \eta \circ \sigma_i(t)$ and $s_j \sqsubseteq \eta \circ \sigma_i(s_i) = \eta \circ \sigma_i(t)$.
As $s_j$ and t are unifiable with the m.g.u. $\sigma_j$, if $s_j \leq \eta \circ \sigma_i(t)$ there exists a substitution $\rho_j$
such that $\rho_j \circ \sigma_j(s_j) = \eta \circ \sigma_i(t)$. Thus $\sigma_j(s_j) \sqsubseteq \eta \circ \sigma_i(t)$, which is a contradiction. Otherwise, let
$U = \{u \in O(s_j) \mid s_j/u \neq \Omega \text{ and } \eta \circ \sigma_i(t)/u = \bullet\}$. Notice that $U \cap O(t) = \emptyset$ because t and $s_j$ are
unifiable. Thus $s_j \leq \eta \circ \sigma_i(t)[u \leftarrow s_j/u \mid u \in U]$, which is an instance of t. Then, we conclude
as in the previous case.

3.3 Decomposition Normalization 

It is useful to transform constraints into an easily readable shape and we propose now a 
normalization algorithm. Its first step is to split these terms into terms with constraints with 
only one function symbol. Its second step is to normalize the constraint associated to a variable 
appearing at the same occurrence in two trees in the decomposition, in order to get the same 
constraint at common occurrences. 

Definition 11 A decomposition T = $\{T_1, \ldots, T_n\}$ is in normal form if and only if:

1. Each constraint occurring in each $T_i$ has only one symbol.

2. If there exist $i \neq j$ and an occurrence u of $T_i$ and $T_j$ such that, for every proper prefix u' of u,
$T_i$ and $T_j$ have the same symbol at u', $T_i/u = \{x \mid x \diamond L_x\}$ and $T_j/u = \{y \mid y \diamond L_y\}$, then
$L_x = L_y$.

Normalization Algorithm Let $\{T_1, \ldots, T_n\}$ be a set of constrained terms and
let T = $\{t \mid x \diamond (\{C(t'_1, \ldots, t'_n)\} \cup L) \wedge P\}$. If there exists a non-variable $t'_i$:

$$T \rightarrow \{t \mid x \diamond (\{C(\Omega, \ldots, \Omega)\} \cup L) \wedge P\} \cup T[x \leftarrow C(x_1, \ldots, x_n)]$$

If there exist $T_i$ and $T_j$ satisfying hypothesis 2 above with $C(\Omega, \ldots, \Omega) \in L_y - L_x$ and
$L_x \neq \emptyset$:

$$T_i \rightarrow T_i[u \leftarrow \{x \mid x \diamond (\{C(\Omega, \ldots, \Omega)\} \cup L_x)\}] \cup T_i[u \leftarrow \{C(x_1, \ldots, x_n) \mid \top\}]$$
The normalization algorithm does not change the strict set of T, as it only makes substitutions
to constrained variables.

Example: The decomposition of the following set of patterns F(G(A), B), F(y, B), F(C, z),
x gives as result:

F(G(A),B)   $\{F(y,B) \mid y \diamond \{G(A)\}\}$   $\{F(C,z) \mid z \diamond \{B\}\}$   $\{x \mid x \diamond \{F(\Omega, B), F(C, \Omega)\}\}$

We notice that the constraints over x and y have to be normalized. The first step of
normalization transforms the patterns $\{F(y,B) \mid y \diamond \{G(A)\}\}$ and $\{x \mid x \diamond \{F(\Omega, B), F(C, \Omega)\}\}$
into these new patterns:

$\{F(y,B) \mid y \diamond \{G\}\}$   $\{F(G(t),B) \mid t \diamond \{A\}\}$
$\{x \mid x \diamond \{F\}\}$   $\{F(y,z) \mid y \diamond \{C\}\ z \diamond \{B\}\}$

The second step gives the resulting set:

F(G(A),B)   $\{F(G(t),B) \mid t \diamond \{A\}\}$   F(C,B)

$\{F(y,B) \mid y \diamond \{G,C\}\}$   $\{F(C,z) \mid z \diamond \{B\}\}$   $\{F(G(t),z) \mid z \diamond \{B\}\}$

$\{F(y,z) \mid y \diamond \{G,C\}\ z \diamond \{B\}\}$   $\{x \mid x \diamond \{F\}\}$

4 Pattern Matching 

In this section we use constrained terms to reason about pattern matching over pure terms. 

Definition 12 A set of patterns $\Pi$ is complete for a term N if every ground instance of N is
also an instance of some $M \in \Pi$.

Let $\Pi = \{M_1, \ldots, M_n\}$ be a set of patterns. The simplest matching predicate over $\Pi$ is
defined by $match_\Pi(t)$ = True if and only if there exists $M_i \in \Pi$ such that $M_i \leq t$, where t is a
pure term. This predicate does not take account of any priority over $\Pi$ and is not suitable for
pattern matching over partially evaluated terms. A. Laville in [6] defines a matching predicate
over pure terms which takes care of the ordering.

Definition 13 Let $\Pi = \{M_1, \ldots, M_n\}$ be a set of patterns ordered by priority, and t be a pure
term. $Match_\Pi(t)$ = True if and only if there exists i ($1 \leq i \leq n$) such that $M_i \leq t$ and, for
every j < i, $t \# M_j$ (t is incompatible with $M_j$).

The priority on patterns is necessary to force the matching with a chosen pattern when several
patterns are compatible.
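
To make Definition 13 concrete, here is a minimal Python sketch of the priority predicate on partial terms. The tuple encoding and the names `OMEGA`, `is_prefix`, `compatible` and `match` are assumptions of this sketch. The function returns the index of the selected pattern instead of a Boolean, and `None` when no pattern can be selected yet.

```python
OMEGA = "_"  # assumption: unknown part of a term; also stands for pattern variables

def is_prefix(p, t):
    """p <= t: the term t is an instance of the linear pattern p."""
    if p == OMEGA:
        return True
    if t == OMEGA:
        return False
    return (p[0] == t[0] and len(p) == len(t)
            and all(is_prefix(a, b) for a, b in zip(p[1:], t[1:])))

def compatible(s, t):
    """s and t have a common instance (both are linear)."""
    if s == OMEGA or t == OMEGA:
        return True
    return (s[0] == t[0] and len(s) == len(t)
            and all(compatible(a, b) for a, b in zip(s[1:], t[1:])))

def match(patterns, t):
    """Index i with M_i <= t and t incompatible with every M_j, j < i; else None."""
    for i, m in enumerate(patterns):
        if is_prefix(m, t) and not any(compatible(t, mj) for mj in patterns[:i]):
            return i
    return None

# Hypothetical constructors; the pattern list is F(A,B), F(y,B), x.
A, B, C = ("A",), ("B",), ("C",)
def F(x, y): return ("F", x, y)
PATTERNS = [F(A, B), F(OMEGA, B), OMEGA]
```

With these patterns, `match(PATTERNS, F(OMEGA, B))` is `None` because the first argument is still needed to choose between the first two patterns, whereas `match(PATTERNS, F(OMEGA, A))` selects the last pattern without looking at the first argument.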

With the concept of constrained terms, we replace priority by constraints. We transform
the ordered set of patterns into an unordered set of constrained ones using the decomposition
algorithm and, without losing generality, we work on the evaluable part of terms with respect
to the set of patterns.

Let $\Pi = \{M_1, \ldots, M_n, \Omega\}$ be an ordered set of patterns and x a variable. The decomposition
algorithm computes $\Pi'$ = Decomp(x, $\Pi$), which is the set of constrained patterns
$M'_i = \{M_i \mid \bigwedge_{j<i} M_i \diamond \{M_j\}\}$. Remember that $M'_i \cap M'_j = \emptyset$ for $i \neq j$ and that the redundant patterns,
represented by empty constrained terms, are eliminated.

Definition 14 (Pattern Matching) Let $\Pi = \{M_1, \ldots, M_n\}$ be a set of disjoint constrained
patterns, and T be a constrained term. $RMatch_\Pi(T)$ = True if and only if there exists
i ($1 \leq i \leq n$) such that $M_i \leq T$.

Notice that the relation $\leq$ over constrained terms is transitive and the predicate $RMatch_\Pi$ is
monotonic with the ordering False < True.

In the following theorem, we prove that the predicate $RMatch_{\Pi'}$, which only uses the prefix
ordering, is as powerful as $Match_\Pi$, which uses the prefix ordering and incompatibility tests.

Theorem 1 Let $\Pi = \{m_1, \ldots, m_n\}$ be an ordered set of patterns and $\Pi'$ the decomposition
of x by $\Pi$. $\Pi'$ is the set of minimal generators of the terms satisfying the predicate $RMatch_{\Pi'}$
and for every pure term t:

$$Match_\Pi(t) = RMatch_{\Pi'}(\{t \mid \top\})$$

Proof: By definition $RMatch_{\Pi'}(T)$ = True if and only if there exists $M' \in \Pi'$ such that
$M' \leq T$, which means T is generated by M'. Conversely each non-empty $M' \in \Pi'$ generates
its ground instances. Furthermore, as the elements of $\Pi'$ are incompatible they are minimal
generators. Let $\Pi' = \{M'_1, \ldots, M'_n\}$. By definition and Lemma 3, $RMatch_{\Pi'}(\{t \mid \top\})$ = True
if and only if there exists $M'_i \in \Pi'$ such that $m_i \leq t$ and, for every j < i, $t \# m_j$. We recognize
there the definition of the predicate $Match_\Pi$ over pure terms.

Notice that $\Pi'$ generates the set of pure terms satisfying $Match_\Pi$ and gives a set of minimal
generators more compact than the minimal set of generators described in [6], page 44. For
instance, the decomposition of the patterns F(A, B, z), F(A, A, z), F(x, y, C) and F(x, y, z)
is the set:

F(A,B,z), $\{F(x,y,C) \mid F(x,y,C) \diamond \{F(A,B,\Omega), F(A,A,\Omega)\}\}$,
F(A,A,z), $\{F(x,y,z) \mid F(x,y,z) \diamond \{F(A,B,\Omega), F(A,A,\Omega)\}\}$

Normalization gives the following set:

F(A,B,z), $\{F(x,y,C) \mid x \diamond \{A\}\}$, $\{F(x,y,z) \mid x \diamond \{A\},\ z \diamond \{C\}\}$,
F(A,A,z), $\{F(x,y,C) \mid y \diamond \{A,B\}\}$, $\{F(x,y,z) \mid y \diamond \{A,B\},\ z \diamond \{C\}\}$

There are several algorithms to check the match of a term by a given set of patterns. We
will use Search Trees to represent these algorithms. The labels of these trees are pairs of a
constrained term and an occurrence of a variable in it. The label of the root is a variable and on
each branch the labels are more and more instantiated terms. The sons of a node with label (T, u)
have as term $T[u \leftarrow T']$ where T' contains at most one function symbol and is a prefix of a
pattern compatible with T. The leaves of the tree are compatible with exactly one pattern (the
occurrence is of no use). The only freedom in the construction is the choice of the occurrence
used to develop the subtrees.

For instance, if the chosen occurrence is always the leftmost variable that leads to
the pattern having priority, as is the case in many compilers for functional languages, the
search tree associated to the patterns F(A, B), F(y, B) and x is:

[Search tree figure; root node x[x]]

The strict set of the match is $\bullet$, $F(y, \bullet)$ and $F(\bullet, B)$. This algorithm will not give a result
for the term $F(\bullet, A)$, which does not belong to the strict set of the match.

Definition 15 A pattern matching algorithm is optimal if and only if it fails to produce a result
only on the strict set of the match.

In the following section we give a characterization for the optimality of the pattern matching 
algorithm. 

4.1 Sequentiality 

We say that a pattern matching problem is sequential when it can be computed without 
looking ahead on a sequential machine. In this section we describe how to decide if a match 
problem is sequential and in such case, how to build the search tree associated to it. This 
section adapts the definitions and proofs of [5] to the case of constrained terms. 

Definition 16 (Index, Sequential)

Let $\mathcal{P}$ be a monotonic predicate on constrained terms (with the truth values domain ordered
as False < True).

• An occurrence u of T is said to be an index of $\mathcal{P}$ in T if and only if

1. $T/u = \{\Omega \mid \top\}$
2. For every $M \geq T$, $\mathcal{P}(M)$ = True implies $(M/u) \neq (T/u)$ (i.e. $(M/u) \neq \{\Omega \mid \top\}$).

• Then $\mathcal{P}$ is sequential at T if and only if whenever $\mathcal{P}(T)$ = False and there exists
$M \geq T$ such that $\mathcal{P}(M)$ = True, it follows that there exists an index of $\mathcal{P}$ in T.

• Finally $\mathcal{P}$ is said to be sequential if and only if it is sequential at every calculable
constrained term.

As the predicate $RMatch_\Pi$ is monotonic, we look for its sequentiality at every term, called
the sequentiality of $\Pi$. The set $Dir_\Pi(T)$ of the indexes of $RMatch_\Pi$ in T is the set of directions
from T to $\Pi$.

Lemma 8 Let T be a constrained term and $\Pi$ a set of disjoint constrained patterns.
$u \in Dir_\Pi(T)$ if and only if $T/u = \{\Omega \mid \top\}$ and, for all $M \in \Pi$ such that $M \uparrow T$, one has $u \in O(M)$
and $M/u \neq \{\Omega \mid \top\}$.

Proof: Let $u \in Dir_\Pi(T)$ and $M \in \Pi$ such that $M \uparrow T$. Thus, $T/u = \{\Omega \mid \top\}$ and there exists
T' such that $T \leq T'$ and $M \leq T'$. Suppose that $u \notin O(M)$. There exists a proper prefix
u' of u such that u = u'w with $w \neq \epsilon$ and $M/u' = \{\Omega \mid \Omega \diamond L_\Omega\}$. As $M \leq T'$, the subterm
T'/u' satisfies the constraints and $T'/u'[w \leftarrow \{\Omega \mid \top\}]$ also. Therefore $M \leq T'[u \leftarrow \{\Omega \mid \top\}]$,
which contradicts the second condition of the definition of a direction and also our hypothesis.
Knowing that $u \in O(M)$, obviously $M/u \neq T/u$. Conversely, if there is a term $T' \geq T$ such
that $RMatch_\Pi(T')$ = True, there is a pattern $M \in \Pi$ compatible with T. Thus $M/u \neq \{\Omega \mid \top\}$,
which implies $T'/u \neq T/u$, and the equivalence is clear.

Remark: This lemma gives a simple characterization of directions. By normalization,
a pattern $M \in \Pi$ is split into several terms $M_1, \ldots, M_n$ which may be compatible. As a
consequence of the simplification rules, $M/u \neq \{\Omega \mid \top\}$ if and only if each $M_i/u \neq \{\Omega \mid \top\}$ and
thus the set of directions $Dir_\Pi(T)$ is the set of directions from T to the normalization of $\Pi$.
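
The characterization of Lemma 8 can be turned into a small Python sketch. The encoding is an assumption of this sketch: constructor terms are tuples, and a variable constrained at depth 1 is a `frozenset` of forbidden head symbols, so that the empty set plays the role of $\{\Omega \mid \top\}$. Two constrained variables are treated as always compatible, which presumes an unbounded signature.

```python
def var(*forbidden):
    """A variable whose head symbol must avoid `forbidden`; var() is Omega."""
    return frozenset(forbidden)

OMEGA = var()

def is_var(t):
    return isinstance(t, frozenset)

def compatible(t, m):
    """t and m have a common instance (depth-1 constraints only)."""
    if is_var(t) or is_var(m):
        if is_var(t) and is_var(m):
            return True
        v, c = (t, m) if is_var(t) else (m, t)
        return c[0] not in v
    return (t[0] == m[0] and len(t) == len(m)
            and all(compatible(a, b) for a, b in zip(t[1:], m[1:])))

def subterm(t, u):
    """t/u, or None when u is not an occurrence of t."""
    for i in u:
        if is_var(t) or i >= len(t):
            return None
        t = t[i]
    return t

def omega_occurrences(t, u=()):
    """Occurrences of t that carry an unconstrained variable."""
    if t == OMEGA:
        yield u
    elif not is_var(t):
        for i in range(1, len(t)):
            yield from omega_occurrences(t[i], u + (i,))

def directions(t, patterns):
    """Occurrences u with t/u = Omega where every compatible pattern tests something."""
    live = [m for m in patterns if compatible(t, m)]
    return [u for u in omega_occurrences(t)
            if all(subterm(m, u) not in (None, OMEGA) for m in live)]

# Hypothetical constructors; DECOMP1 is the decomposition of example 1 in Section 5.
A, B = ("A",), ("B",)
def F(x, y): return ("F", x, y)
def G(x, y, z): return ("G", x, y, z)
DECOMP1 = [F(A, B), F(A, var("B")), F(var("A"), B), F(var("A"), var("B")), var("F")]
BERRY = [G(A, A, OMEGA), G(B, OMEGA, A), G(OMEGA, B, B)]
```

For `DECOMP1` both arguments of `F(OMEGA, OMEGA)` are directions, since the constrained variables count as tests. For the three patterns of Berry's example, `G(OMEGA, OMEGA, OMEGA)` has no direction; adding further patterns to the list can only remove directions, so the same holds for the full decomposition.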

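Lemma 9 Let $\Pi$ be a set of disjoint constrained patterns, T = $\{t \mid P\}$ a term and $M \in \Pi$ a
pattern compatible with T. Then:

$$Dir_\Pi(T) = Dir_{\Pi'}(T \sqcap M) \quad \text{where } \Pi' = \{M \in \Pi \mid T \uparrow M\}$$

Proof: Suppose $u \in Dir_{\Pi'}(T)$. Then $T/u = \{\Omega \mid \top\}$ and for every $M \in \Pi'$, $u \in O(M)$ and
$M/u \neq \{\Omega \mid \top\}$ by Lemma 8. As M belongs to $\Pi'$, $u \in O(M)$, thus $u \in O(T \sqcap M)$ and
$(T \sqcap M)/u = \{\Omega \mid \top\}$. In conclusion $u \in Dir_{\Pi'}(T \sqcap M)$. Conversely, let $u \in Dir_{\Pi'}(T \sqcap M)$. Then for every
$M \in \Pi'$, $M/u \neq \{\Omega \mid \top\}$ and $(T \sqcap M)/u = \{\Omega \mid \top\}$. Remember that $\{\Omega \mid \Omega \diamond L \vee \Omega \diamond L'\}$ is
always different from $\{\Omega \mid \top\}$ because $\bullet$ is an instance of $\{\Omega \mid \top\}$ but not of $\{\Omega \mid \Omega \diamond L \vee \Omega \diamond L'\}$.
Consequently $T/u = \{\Omega \mid \top\}$. Now take any $M \in \Pi$ compatible with T. Then $M \in \Pi'$ and
$M/u \neq \{\Omega \mid \top\}$. In conclusion, $u \in Dir_\Pi(T)$.

This property allows us to look for directions only in the prefixes of patterns.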
Theorem 2 Let $\Pi$ be a set of disjoint constrained patterns. If $\Pi$ is finite, one can decide if $\Pi$
is sequential: one just checks that $RMatch_\Pi$ is sequential at every prefix of $\Pi$.

Proof: If $\Pi$ is sequential, then $RMatch_\Pi$ is sequential at every term T and in particular at every
prefix of some element of $\Pi$. Conversely, $\Pi$ is sequential if and only if $Dir_\Pi(T) \neq \emptyset$ for all T
such that $RMatch_\Pi(T)$ = False. If T is not compatible with $\Pi$, there is no instance of T which
satisfies the predicate and thus, by definition of sequentiality, $RMatch_\Pi$ is sequential at T.
Otherwise, there exists $M \in \Pi$ compatible with T and $Dir_\Pi(T) \supseteq Dir_\Pi(T \sqcap M)$ by Lemma 9.
If $RMatch_\Pi(T \sqcap M)$ were True, there would exist $M' \in \Pi$ more general than $T \sqcap M$,
which would contradict the fact that $RMatch_\Pi(T)$ = False. Thus $RMatch_\Pi(T \sqcap M)$ = False
and $Dir_\Pi(T \sqcap M) \neq \emptyset$, which implies $Dir_\Pi(T) \neq \emptyset$.

Theorem 3 Optimality and sequentiality are equivalent on the pattern matching algorithms.

Proof: Let $\Pi$ be a complete decomposition. $\Pi$ is sequential if and only if there exists a
search tree in which each label (T, x) satisfies $x \in Dir_\Pi(T)$. The set of terms for which the
algorithm does not terminate is generated by the terms $T[x \leftarrow \bullet]$ where (T, x) is a label of the
search tree. By definition the algorithm is optimal if and only if the set of terms for which the
algorithm does not terminate is generated by the strict set. Thus, we only need to prove that
for every prefix T of $\Pi$, $x \in Dir_\Pi(T)$ if and only if $T[x \leftarrow \bullet]$ belongs to the strict set of $\Pi$.
An occurrence u of a variable is a direction from T to $\Pi$ if and only if, for every pattern M
compatible with T, $M/u \neq \{\Omega \mid \top\}$, which is equivalent to $T[x \leftarrow \bullet]$ being incompatible with each
M in $\Pi$. That means $T[x \leftarrow \bullet]$ belongs to the strict set because $\Pi$ is a complete decomposition.

The theorems state that in order to verify the sequentiality of a match problem it is sufficient 
to verify it on the set of prefixes of the patterns, so the match is sequential if and only if the 
search tree of a variable can be built. 

We can now build a search tree for a complete decomposition $\Pi$ which is optimal both in the
number of tests in each path of the tree and in the number of terms for which the algorithm
terminates.

SearchTree(N, $\Pi$) =

    T where Root(T) = N and

    if there is no direction from N to $\Pi$,
        if $N \in \Pi$, $\epsilon$ is the only occurrence of T.
        otherwise the algorithm fails
    otherwise
        let u be one direction of N
        and L be the set $\{F(\ldots) \mid \exists M \in$ Decomp(N, $\Pi$) such that $F(\ldots) \uparrow M\}$.
        For each element $l_i \in L$, i is an occurrence of T
        and T/i = SearchTree($N[u \leftarrow l_i]$, $\Pi$)
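
The construction can be sketched in Python under the following assumptions, none of which come from the CAML prototype: patterns are already normalized with depth-1 constraints, constructor terms are tuples, a constrained variable is a `frozenset` of forbidden head symbols, the first direction found is always chosen, and the tree is returned as nested tuples. A failure of the algorithm is signalled by a `ValueError`.

```python
def var(*forbidden):
    """A variable whose head symbol must avoid `forbidden`; var() is Omega."""
    return frozenset(forbidden)

OMEGA = var()

def is_var(t):
    return isinstance(t, frozenset)

def compatible(t, m):
    if is_var(t) or is_var(m):
        if is_var(t) and is_var(m):
            return True
        v, c = (t, m) if is_var(t) else (m, t)
        return c[0] not in v
    return (t[0] == m[0] and len(t) == len(m)
            and all(compatible(a, b) for a, b in zip(t[1:], m[1:])))

def is_prefix(m, t):
    """m <= t: every instance of t is an instance of m."""
    if is_var(m):
        return m <= t if is_var(t) else t[0] not in m
    if is_var(t):
        return False
    return (m[0] == t[0] and len(m) == len(t)
            and all(is_prefix(a, b) for a, b in zip(m[1:], t[1:])))

def subterm(t, u):
    for i in u:
        if is_var(t) or i >= len(t):
            return None
        t = t[i]
    return t

def replace(t, u, s):
    """t[u <- s]"""
    if not u:
        return s
    i = u[0]
    return t[:i] + (replace(t[i], u[1:], s),) + t[i + 1:]

def omega_occurrences(t, u=()):
    if t == OMEGA:
        yield u
    elif not is_var(t):
        for i in range(1, len(t)):
            yield from omega_occurrences(t[i], u + (i,))

def directions(t, patterns):
    live = [m for m in patterns if compatible(t, m)]
    return [u for u in omega_occurrences(t)
            if all(subterm(m, u) not in (None, OMEGA) for m in live)]

def search_tree(n, patterns):
    dirs = directions(n, patterns)
    if not dirs:
        for i, m in enumerate(patterns):
            if is_prefix(m, n):
                return ("leaf", i)            # n is matched by pattern number i
        raise ValueError("no direction: the match is not sequential here")
    u = dirs[0]                               # assumption: take the first direction
    branches = {}
    for m in patterns:
        if compatible(n, m):
            s = subterm(m, u)
            label = s if is_var(s) else s[0]  # constructor name or constrained variable
            arg = s if is_var(s) else (s[0],) + (OMEGA,) * (len(s) - 1)
            if label not in branches:
                branches[label] = search_tree(replace(n, u, arg), patterns)
    return ("test", u, branches)

# Hypothetical constructors; DECOMP1 is the decomposition of example 1 in Section 5.
A, B = ("A",), ("B",)
def F(x, y): return ("F", x, y)
def G(x, y, z): return ("G", x, y, z)
DECOMP1 = [F(A, B), F(A, var("B")), F(var("A"), B), F(var("A"), var("B")), var("F")]
BERRY = [G(A, A, OMEGA), G(B, OMEGA, A), G(OMEGA, B, B)]
TREE1 = search_tree(OMEGA, DECOMP1)
```

`TREE1` has one leaf per element of the decomposition, while `search_tree(OMEGA, BERRY)` raises `ValueError` once the symbol G has been tested, since no argument of G is a direction.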
We have extended sequentiality to constrained terms, which allows us to compute optimal
algorithms for call by pattern matching. If we complete the initial set of patterns by $\Omega$ in
order to cover all the cases, we optimize both the success and the failure of the matching. The
sequentiality of the set of patterns can be modified by the inclusion of the new element $\Omega$, but,
as the search tree covers all the cases anyway, this restriction of the sequentiality has a positive
effect on the result.

In case of non-sequential sets of patterns, it is possible to build a search tree by ignoring
some of the patterns. Two possibilities appear: to ignore, during the direction search, either
the patterns with lower priority or those that prevent the existence of directions.

5 Examples 

We wrote a prototype of this method in CAML [10] which is used to generate mechanically
all the examples in the paper. In this prototype we only represent constraints of depth 1; other
constraints are normalized during the application of substitutions. In all the examples we add
the term x at the end of the list of patterns to complete the set.

1. With the set of patterns F(A,B), F(A,z), F(y,B) the decomposition produces the
following constrained terms:

F(A,B)   $\{F(A,z) \mid z \diamond \{B\}\}$   $\{F(y,B) \mid y \diamond \{A\}\}$

$\{F(y,z) \mid y \diamond \{A\}\ z \diamond \{B\}\}$   $\{x \mid x \diamond \{F\}\}$

And the strict set is:

$\bullet$, $F(x, \bullet)$, $F(\bullet, y)$

The nodes of search trees are pairs formed by a term and a variable which is a direction
in the term. The arcs are labeled by the possible values the direction can take and leaves
are represented by the matched patterns.

[Search tree figure; root node x[x]]

2. For Berry's example: G(A,A,x), G(B,y,A), G(z,B,B) the decomposition of
G(z, y, x) produces:

G(A,A,x)   $\{G(z,y,x) \mid z \diamond \{A,B\}\ y \diamond \{B\}\}$   $\{G(z,y,x) \mid y \diamond \{A,B\}\ x \diamond \{A\}\}$

G(B,y,A)   $\{G(z,y,x) \mid z \diamond \{B\}\ y \diamond \{A,B\}\}$   $\{G(z,y,x) \mid y \diamond \{A\}\ x \diamond \{A,B\}\}$

G(z,B,B)   $\{G(z,y,x) \mid z \diamond \{A,B\}\ x \diamond \{B\}\}$   $\{G(z,y,x) \mid z \diamond \{A\}\ y \diamond \{B\}\ x \diamond \{A\}\}$

$\{G(z,y,x) \mid z \diamond \{A\}\ x \diamond \{A,B\}\}$   $\{G(z,y,x) \mid z \diamond \{B\}\ y \diamond \{A\}\ x \diamond \{B\}\}$

As the original patterns have no common instance, they all belong to the decomposition,
and there is no direction to start the match.

3. In this example extracted from a CAML program, we try to match lists of Booleans (Nil
represents the empty list, x :: y is a list containing the element x followed by the list y).

(y :: True :: u) , (False :: Nil) , Nil

The decomposition of this example is:

(y :: True :: u)   Nil   $\{(y :: z) \mid z \diamond \{Nil, ::\}\}$

$\{x \mid x \diamond \{Nil, ::\}\}$   (False :: Nil)   $\{(y :: t :: u) \mid t \diamond \{True\}\}$

$\{(y :: z) \mid y \diamond \{False\}\ z \diamond \{::\}\}$

The strict set is:

$\bullet$ , y :: $\bullet$ , $\bullet$ :: Nil , y :: $\bullet$ :: u

and the search tree is:

[Search tree figure; root node x[x]]

In the decomposition of this example, some of the patterns have the constraint
$x \diamond \{Nil, ::\}$. In a typed language, if Nil and :: are the only constructors of
lists, these patterns represent an empty set. Eliminating them (and assuming that $\neg$True
implies False and that $\neg$False implies True) the decomposition becomes:

(y :: True :: u) , (False :: Nil) , Nil , (y :: False :: u) , (True :: Nil)

which is the set of minimal extended patterns as defined in [6]. The search tree now
becomes:
4. The sequentiality of a problem might depend on the signature of terms. For instance, the
decomposition of F(x,y) by the patterns F(A, A), F(B, B) produces:

F(A,A)   $\{F(x,y) \mid y \diamond \{A,B\}\}$   $\{F(x,y) \mid x \diamond \{A\}\ y \diamond \{B\}\}$

F(B,B)   $\{F(x,y) \mid x \diamond \{B\}\ y \diamond \{A\}\}$   $\{F(x,y) \mid x \diamond \{A,B\}\}$

With the following strict set:

$F(A, \bullet)$ , $F(\bullet, A)$ , $F(\bullet, \bullet)$ , $F(B, \bullet)$ , $F(\bullet, B)$

This problem is not sequential because of the patterns $\{F(x,y) \mid y \diamond \{A,B\}\}$ and
$\{F(x,y) \mid x \diamond \{A,B\}\}$. However, if the same match problem were given for a type that
is defined with only two constants like the Booleans, those two patterns would represent
empty sets and thus could be eliminated. In that case, the decomposition of F(x, y) by
the patterns F(True, True), F(False, False) produces:

F(True, True) , F(False, False) , F(True, False) , F(False, True)

And the problem becomes sequential with the search tree:

[Search tree figure; root node F(x,y)[x]]

6 Conclusion

Constrained terms are used to extend sequentiality to ambiguous sets of patterns. The
introduction of an explicit symbol $\bullet$ to represent non-terminating evaluations allows us to use
constraints for partially evaluated terms.
Current compilers for pattern matching use different techniques to improve the code
generated for call by pattern matching, like the introduction of heuristics for finding directions,
or the analysis of execution tests to improve the most frequent cases. Both heuristics and the
analysis of execution tests become unnecessary, as our algorithm computes directions and produces an
optimal search tree that includes only unavoidable tests.

The elements of the decomposition are exactly the leaves of the optimal search tree, which
depends inherently on the match problem. The order of complexity of the substitution and of
the restriction is $O(l)$. For the decomposition it is $O(m \cdot l)$ and for the search of directions
during the construction of a search tree it is $O(m \cdot l)$, where m is the number of patterns of the
match and l is their average size.

The technique presented in this paper allows the implementation of optimal compilers for
call by pattern matching in all the languages that support this feature, and encourages language
designers to introduce it into new programming languages.